Moxin Li

Plant, Persist, Trigger: Sleeper Attack on Large Language Model Agents

May 27, 2026
Via arXiv

ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning

May 25, 2026
Via arXiv

SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs

May 12, 2026
Via arXiv

SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation

May 12, 2026
Via arXiv

On Predicting the Post-training Potential of Pre-trained LLMs

May 12, 2026
Via arXiv

RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models

Dec 08, 2025
Via arXiv

MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation

May 26, 2025
Via arXiv

Assistant-Guided Mitigation of Teacher Preference Bias in LLM-as-a-Judge

May 25, 2025
Via arXiv

Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment

Feb 20, 2025
Via arXiv

HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning

Feb 17, 2025
Via arXiv